AITopics | mean train loss


Practical tradeoffs between memory, compute, and performance in learned optimizers

Metz, Luke, Freeman, C. Daniel, Harrison, James, Maheswaranathan, Niru, Sohl-Dickstein, Jascha

arXiv.org Artificial Intelligence, Jul-16-2022

Optimization plays a costly and crucial role in developing machine learning systems. In learned optimizers, the few hyperparameters of commonly used hand-designed optimizers, e.g. Adam or SGD, are replaced with flexible parametric functions. The parameters of these functions are then optimized so that the resulting learned optimizer minimizes a target loss on a chosen class of models. Learned optimizers can both reduce the number of required training steps and improve the final test loss. However, they can be expensive to train, and once trained can be expensive to use due to computational and memory overhead for the optimizer itself. In this work, we identify and quantify the design features governing the memory, compute, and performance trade-offs for many learned and hand-designed optimizers. We further leverage our analysis to construct a learned optimizer that is both faster and more memory efficient than previous work.

Despite the huge computational costs associated with training large neural models, the set of optimization algorithms used to train them has largely been restricted to simple update functions mapping from gradients to parameter updates (e.g. ...). These algorithms typically depend on a small number of hand-designed features and parameters. However, the last decade in machine learning research has repeatedly seen small, hand-designed models outperformed by parameterized models (such as neural networks) trained to purpose on large amounts of data (LeCun et al., 2015). Thus, a promising direction to improve training performance and reduce costs is to replace hand-designed optimizers with more expressive learned optimizers, trained on problems similar to those encountered in practice.

Learned optimizers specify parameter update rules using a flexible parametric form and learn the parameters of this function from a "dataset" of optimization tasks, a procedure typically referred to as meta-training or meta-learning (Andrychowicz et al., 2016; Finn et al., 2017; Hochreiter et al., 2001). Learned optimizers represent a path towards improved optimizer performance, and possess the ability to target different objectives (e.g. ...). Despite being an active area of research (Andrychowicz et al., 2016; Wichrowska et al., 2017; Chen et al., 2020; Metz et al., 2020b; 2021; Almeida et al., 2021; Zheng et al., 2022), they are not yet commonly used in practice. Several challenges have limited the widespread application of learned optimizers: they are typically difficult to meta-train on a task family of interest, they can require significant memory and compute overhead when applied, and they often generalize less well to novel tasks than hand-designed optimizers.
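
To make the idea of meta-training concrete, here is a minimal sketch in plain Python. It is not the optimizer or the meta-training procedure from the paper. Everything in it is an assumption chosen for illustration: the toy task family of random quadratics, the two-weight update rule (a linear function of the gradient and a momentum accumulator), the unroll length, and the use of random-search hill climbing as the outer loop. The names `make_tasks`, `learned_update`, `mean_train_loss`, and `meta_train` are hypothetical.

```python
import math
import random

def make_tasks(n_tasks, dim, rng):
    """Sample a toy task family: quadratics 0.5 * sum(a_i * x_i^2)."""
    tasks = []
    for _ in range(n_tasks):
        curvature = [rng.uniform(0.5, 2.0) for _ in range(dim)]
        x0 = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        tasks.append((curvature, x0))
    return tasks

def loss_and_grad(curvature, x):
    """Loss and gradient of one quadratic task at point x."""
    loss = 0.5 * sum(a * xi * xi for a, xi in zip(curvature, x))
    grad = [a * xi for a, xi in zip(curvature, x)]
    return loss, grad

def learned_update(theta, grad, momentum):
    """Per-parameter update rule with two meta-learned weights (assumed form)."""
    w_grad, w_mom = math.exp(theta[0]), math.exp(theta[1])
    return -(w_grad * grad + w_mom * momentum)

def mean_train_loss(theta, task, n_steps=20, decay=0.9):
    """Unroll the learned optimizer on one task; average the loss over steps."""
    curvature, x0 = task
    x = list(x0)
    m = [0.0] * len(x)  # momentum accumulator: the optimizer's extra state
    total = 0.0
    for _ in range(n_steps):
        loss, grad = loss_and_grad(curvature, x)
        total += loss
        m = [decay * mi + (1.0 - decay) * gi for mi, gi in zip(m, grad)]
        x = [xi + learned_update(theta, gi, mi)
             for xi, gi, mi in zip(x, grad, m)]
    return total / n_steps

def meta_loss(theta, tasks):
    """Meta-objective: mean train loss averaged over the task family."""
    return sum(mean_train_loss(theta, t) for t in tasks) / len(tasks)

def meta_train(tasks, theta0, n_iters=200, sigma=0.3, seed=0):
    """Outer loop: random-search hill climbing (a stand-in meta-optimizer)."""
    rng = random.Random(seed)
    theta, best = list(theta0), meta_loss(theta0, tasks)
    for _ in range(n_iters):
        candidate = [t + rng.gauss(0.0, sigma) for t in theta]
        value = meta_loss(candidate, tasks)
        if value < best:  # keep a perturbation only if it lowers the meta-loss
            theta, best = candidate, value
    return theta, best

tasks = make_tasks(n_tasks=8, dim=4, rng=random.Random(0))
theta0 = [math.log(0.01), math.log(0.01)]  # start from very small update weights
initial = meta_loss(theta0, tasks)
theta, final = meta_train(tasks, theta0)
print("meta-loss before:", initial, "after:", final)
```

The momentum list `m` is the kind of per-parameter optimizer state that contributes to memory overhead, and each call to `learned_update` is the per-parameter compute cost. In this toy both are tiny, but the same two quantities grow with a more expressive update function.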

conv meta loss cifar10, mean train loss, optimizer


2203.1186

